HDFS-16303. Improve handling of datanode lost while decommissioning #3746
KevinWikant wants to merge 2 commits into apache:trunk
Conversation
```java
import java.util.Set;
import java.util.concurrent.ExecutionException;

import org.apache.hadoop.fs.Path;
```
will remove the unused imports added to this class in the next revision
💔 -1 overall
This message was automatically generated.
@sodonnel The existing test "TestDecommissioningStatus.testDecommissionStatusAfterDNRestart" will be problematic for this change. As previously stated, removing the dead DECOMMISSION_INPROGRESS node from the DatanodeAdminManager means that, when there are no LowRedundancy blocks, the dead node will remain in DECOMMISSION_INPROGRESS rather than transitioning to DECOMMISSIONED. This violates the expectation that the unit test enforces, which is that a dead DECOMMISSION_INPROGRESS node should transition to DECOMMISSIONED when there are no LowRedundancy blocks. Therefore, I think this is a good argument in favor of the original proposed change: #3675
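To make the test's expectation concrete, here is a minimal, hypothetical sketch of the transition rule the unit test enforces (the class, method, and parameter names are illustrative, not Hadoop's actual API): once a DECOMMISSION_INPROGRESS node has no remaining LowRedundancy blocks, it should complete decommissioning even if the node itself is dead.

```java
// Hypothetical sketch of the state transition enforced by
// testDecommissionStatusAfterDNRestart; not the real
// DatanodeAdminManager logic.
public class DecommissionStateSketch {
    enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

    // Next admin state for a node, given whether it is alive and
    // whether any of its blocks are still under-replicated.
    static AdminState nextState(AdminState current, boolean alive,
                                boolean hasLowRedundancyBlocks) {
        if (current == AdminState.DECOMMISSION_INPROGRESS
                && !hasLowRedundancyBlocks) {
            // The test's expectation: liveness does not matter here --
            // with nothing left to replicate, decommissioning completes.
            return AdminState.DECOMMISSIONED;
        }
        // Otherwise the node stays put until replication catches up.
        return current;
    }

    public static void main(String[] args) {
        // A dead (alive=false) in-progress node with no LowRedundancy
        // blocks still transitions to DECOMMISSIONED.
        System.out.println(nextState(
            AdminState.DECOMMISSION_INPROGRESS, false, false));
        // prints DECOMMISSIONED
    }
}
```

Keeping the dead node out of the DatanodeAdminManager entirely (as this PR does) means this transition never fires, which is exactly the conflict described above.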
Closing this PR in favor of this alternate solution: #3675
Description of PR
Fixes a bug in Hadoop HDFS where, if more than "dfs.namenode.decommission.max.concurrent.tracked.nodes" datanodes are lost while in the decommissioning state, all forward progress towards decommissioning any datanodes (including healthy ones) is blocked.
JIRA: https://issues.apache.org/jira/browse/HDFS-16303
Additional Details
To solve this HDFS bug, there are 2 different proposals:
These 2 different implementations will largely behave the same from a user perspective. There is however 1 key difference:
How was this patch tested?
3 new unit tests added to both "TestDecommission" & "TestDecommissionWithBackoffMonitor":
For code changes:
LICENSE, LICENSE-binary, NOTICE-binary files?